21 research outputs found

    Design, development and field evaluation of a Spanish into sign language translation system

    This paper describes the design, development and field evaluation of a machine translation system from Spanish into Spanish Sign Language (LSE: Lengua de Signos Española). The system is intended to help Deaf people renew their Driver’s License. It consists of a speech recognizer (which decodes the spoken utterance into a word sequence), a natural language translator (which converts the word sequence into a sequence of signs in the sign language), and a 3D avatar animation module (which plays back the signs). Three technological approaches to the natural language translator have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. In the final version, the translator combines all three alternatives in a hierarchical structure. The paper includes a detailed description of the field evaluation, which was carried out at the Local Traffic Office in Toledo with real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of them specific to LSE).
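    The abstract says only that the final translator combines the example-based, rule-based and statistical approaches in a hierarchical structure. One plausible reading is a priority cascade in which each level either returns a sign sequence or declines and passes the sentence down. The Python sketch below illustrates that reading under stated assumptions: the level order, the similarity threshold (min_similarity=0.8), the lexicon lookup standing in for the rules, the upper-casing placeholder standing in for the statistical translator, and the toy English words and glosses are all hypothetical, not the authors' implementation.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a translator maps words to signs, or returns None to decline.
Translator = Callable[[Sequence[str]], Optional[List[str]]]


def make_example_based(examples, min_similarity=0.8) -> Translator:
    """Reuse the signs of the most similar stored example if it is similar enough."""
    def translate(words):
        if not examples:
            return None
        scored = [(SequenceMatcher(None, list(src), list(words)).ratio(), signs)
                  for src, signs in examples]
        score, signs = max(scored, key=lambda pair: pair[0])
        return list(signs) if score >= min_similarity else None
    return translate


def make_rule_based(lexicon) -> Translator:
    """Toy stand-in for a rule set: word-by-word lookup that declines on unknown words."""
    def translate(words):
        if all(w in lexicon for w in words):
            # An empty gloss means the word has no sign and is dropped.
            return [lexicon[w] for w in words if lexicon[w]]
        return None
    return translate


def fallback_gloss(words):
    """Placeholder for a statistical translator: it never declines."""
    return [w.upper() for w in words]


def hierarchical_translate(words, levels: List[Tuple[str, Translator]]):
    """Try each level in order; return (level_name, signs) from the first that answers."""
    for name, translate in levels:
        signs = translate(words)
        if signs is not None:
            return name, signs
    return "none", []


levels = [
    ("example-based", make_example_based([(["renew", "license"], ["RENEW", "LICENSE"])])),
    ("rule-based", make_rule_based({"i": "I", "want": "WANT", "a": "", "photo": "PHOTO"})),
    ("fallback", fallback_gloss),
]
print(hierarchical_translate(["i", "want", "a", "photo"], levels))
# ('rule-based', ['I', 'WANT', 'PHOTO'])
```

    Putting the most conservative level first means a close match to a stored example is reused verbatim, while the always-answering level at the bottom guarantees some output for every sentence.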

    Topic Identification based on Bayesian Belief Networks in the context of an Air Traffic Control Task

    Abstract: In this paper we present a topic identification task based on Bayesian Belief Networks. These networks are trained on the semantic concepts labeled for each sentence to be processed, which were defined by an expert in the application domain. The topics to be identified correspond to the five control positions available at an airport. The evaluation is carried out on blocks of sentences. We obtain a block identification error rate of 3.5% under a 'winner takes all' evaluation scheme with 5 sentences per block. Finally, we compare these results with a strategy based on a Bayesian classifier whose parameter vector is the set of perplexities obtained by applying a topic-specific trigram language model for each topic. The results show the importance of taking into account the order in which information appears, and the need to include it in the Bayesian Networks in future work. Keywords: Topic Identification, Bayesian Networks, N-gram, Air Traffic Control.
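    The comparison baseline in this abstract feeds a vector of per-topic trigram perplexities into a Bayesian classifier. The Python sketch below shows how such a perplexity vector could be computed for a block of sentences. It is a minimal sketch under assumptions: add-one smoothing is used because the abstract names no smoothing method, the final decision is simplified to picking the lowest-perplexity topic rather than the paper's Bayesian classifier, and the topic names and sentences are hypothetical.

```python
import math
from collections import Counter


class AddOneTrigramLM:
    """Trigram language model with add-one (Laplace) smoothing; a deliberately simple stand-in."""

    def __init__(self, sentences):
        self.tri = Counter()
        self.bi = Counter()
        vocab = {"</s>"}
        for words in sentences:
            toks = ["<s>", "<s>"] + words + ["</s>"]
            vocab.update(words)
            for i in range(2, len(toks)):
                self.tri[tuple(toks[i - 2:i + 1])] += 1
                self.bi[tuple(toks[i - 2:i])] += 1
        self.V = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def logprob(self, words):
        """Return (natural-log probability, number of predicted tokens) for one sentence."""
        toks = ["<s>", "<s>"] + words + ["</s>"]
        lp = 0.0
        for i in range(2, len(toks)):
            num = self.tri[tuple(toks[i - 2:i + 1])] + 1
            den = self.bi[tuple(toks[i - 2:i])] + self.V
            lp += math.log(num / den)
        return lp, len(toks) - 2


def block_perplexity(lm, block):
    """Perplexity of a whole block of sentences under one topic model."""
    total_lp, total_n = 0.0, 0
    for words in block:
        lp, n = lm.logprob(words)
        total_lp += lp
        total_n += n
    return math.exp(-total_lp / total_n)


def perplexity_vector(lms, block):
    """One perplexity per topic: the kind of feature vector the abstract describes."""
    return {topic: block_perplexity(lm, block) for topic, lm in lms.items()}


def identify_block(lms, block):
    """Simplified decision: the topic whose model finds the block least surprising."""
    ppl = perplexity_vector(lms, block)
    return min(ppl, key=ppl.get)


lms = {
    "ground": AddOneTrigramLM([["taxi", "to", "holding", "point"], ["push", "back", "approved"]]),
    "tower": AddOneTrigramLM([["cleared", "for", "takeoff"], ["cleared", "to", "land"]]),
}
block = [["taxi", "to", "holding", "point"], ["push", "back", "approved"]]
print(identify_block(lms, block))  # ground
```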

    LOW-RESOURCE LANGUAGE RECOGNITION USING A FUSION OF PHONEME POSTERIORGRAM COUNTS, ACOUSTIC AND GLOTTAL-BASED I-VECTORS

    This paper describes our system for the Albayzin 2012 LRE competition. A main characteristic of this evaluation was the small number of files available for training, especially in the empty condition, where no training set was provided, only a development set. In addition, the whole database was built from online videos, and around one third of the training data was labeled as noisy. Our primary system was the fusion of three different i-vector based systems: an acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs, which improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. The paper reports official and post-evaluation results for all conditions, using both the metrics proposed for the evaluation and the Cavg metric. Index Terms: LID system, noise robustness, scarce data, posteriorgram counts, i-vectors
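    Cavg is the average detection cost used in NIST-style language recognition evaluations. For each target language L_T it adds the miss cost C_miss * P_target * P_miss(L_T) to the false-alarm costs C_fa * P_nontarget * P_fa(L_T, L_N) over every other language L_N, with P_nontarget = (1 - P_target) / (N_L - 1), and then averages over the N_L target languages. The Python sketch below computes it from hard accept/reject decisions. The defaults C_miss = C_fa = 1 and P_target = 0.5 follow common NIST LRE practice and are assumptions here, since the abstract does not give the Albayzin 2012 settings.

```python
from typing import Dict, List, Sequence, Tuple

# Each trial: (true_language, {target_language: accepted?}), i.e. hard detection decisions.
Trial = Tuple[str, Dict[str, bool]]


def cavg(trials: List[Trial], languages: Sequence[str],
         c_miss: float = 1.0, c_fa: float = 1.0, p_target: float = 0.5) -> float:
    """Average detection cost over target languages (NIST LRE-style Cavg)."""
    n = len(languages)
    p_nontarget = (1.0 - p_target) / (n - 1)

    def rate(true_lang: str, target: str, accepted: bool) -> float:
        # Fraction of trials spoken in true_lang whose decision for `target` equals `accepted`.
        relevant = [d for t, d in trials if t == true_lang]
        if not relevant:
            return 0.0  # no trials for this language; counted as zero error in this sketch
        return sum(d[target] == accepted for d in relevant) / len(relevant)

    total = 0.0
    for lt in languages:
        cost = c_miss * p_target * rate(lt, lt, False)  # misses: the true target was rejected
        for ln in languages:
            if ln != lt:
                cost += c_fa * p_nontarget * rate(ln, lt, True)  # false alarms on non-targets
        total += cost
    return total / n


langs = ["es", "en"]
accept_all = [("es", {"es": True, "en": True}), ("en", {"es": True, "en": True})]
print(cavg(accept_all, langs))  # 0.5
```

    A system that accepts every language never misses but false-alarms on every non-target trial, which is why it lands at 0.5 with these defaults rather than at zero.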

    INTERSPEECH 2007 Language Identification based on n-gram Frequency Ranking

    We present a novel approach to language identification based on a text categorization technique: n-gram frequency ranking. We use a parallel phone recognizer, as in PPRLM, but instead of a language model we build a ranking of the most frequent n-grams, keeping only a fraction of them. We then compute the distance between the ranking of the input sentence and the ranking of each language, based on the difference in relative positions of each n-gram. The goal of the ranking is to reliably model a longer span than PPRLM (5-grams instead of trigrams), since a ranking needs less training data to be estimated reliably. We show that this approach outperforms PPRLM (6% relative improvement) thanks to the inclusion of 4-grams and 5-grams in the classifier. We present two alternatives: ranking by the absolute number of occurrences and ranking by discriminative values (11% relative improvement). Index Terms: Language Identification, n-gram frequency ranking, text categorization, PPRL
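    The ranking-and-distance idea can be sketched in Python for the output of a single phone recognizer. This is a minimal sketch under stated assumptions: it uses the out-of-place rank distance familiar from n-gram text categorization, while the cut-off keep=300, the penalty for n-grams missing from a language ranking, and the tie-breaking rule are choices made here rather than values from the abstract; the discriminative-value variant is not shown, and the phone strings are made up.

```python
from collections import Counter


def ngram_ranking(phones, max_n=5, keep=300):
    """Rank all 1..max_n-grams of a phone sequence by frequency, keeping the top `keep`."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    # Most frequent first; ties broken lexicographically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:keep]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place_distance(doc_rank, lang_rank):
    """Sum of rank differences; n-grams absent from the language ranking get the max penalty."""
    max_penalty = len(lang_rank)
    total = 0
    for gram, r in doc_rank.items():
        total += abs(r - lang_rank[gram]) if gram in lang_rank else max_penalty
    return total


def identify(phones, lang_rankings, max_n=5, keep=300):
    """Return the language whose ranking is closest to the input sentence's ranking."""
    doc_rank = ngram_ranking(phones, max_n, keep)
    return min(lang_rankings,
               key=lambda lang: out_of_place_distance(doc_rank, lang_rankings[lang]))


train = {
    "es": "a l a s o l a e s a l a e o".split(),
    "en": "dh ax k ae t s ae t dh ax".split(),
}
rankings = {lang: ngram_ranking(seq) for lang, seq in train.items()}
query = ngram_ranking("a l a e o".split())
print({lang: out_of_place_distance(query, r) for lang, r in rankings.items()})
```

    With several parallel recognizers, one would presumably compute a distance per recognizer and combine them, for example by summing.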

    Language Identification using several sources of information with a multiple-Gaussian classifier

    We present several innovative techniques that can be applied in a PPRLM system for language identification (LID). To normalize the scores, remove their bias and improve the classifier, we compared a bias removal technique (up to 19% relative improvement (RI)) with a Gaussian classifier (up to 37% RI). We then add further sources of information as separate feature vectors of the Gaussian classifier: the sentence acoustic score (11% RI), the average acoustic score for each phoneme (11% RI), and the average duration of each phoneme (7.8% RI). Using a multiple-Gaussian classifier with 4 feature vectors yielded an additional 15.1% RI, and using the 4 feature vectors instead of PPRLM alone provides a 26.1% RI. Finally, we successfully include additional acoustic HMMs for the same language (10% RI). We show that all these improvements are mostly additive.
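    A Gaussian back-end of this kind can be sketched as one Gaussian per language over each feature vector, with several feature vectors combined by adding their log-likelihoods. This is a minimal Python sketch under assumptions the abstract does not state: diagonal covariances, a variance floor of 1e-6, and independence between feature vectors (which is what makes summing the log-likelihoods valid). The two feature streams and all score values are made up for illustration.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


class DiagonalGaussian:
    """One diagonal-covariance Gaussian per language over a fixed-length feature vector."""

    def __init__(self, var_floor: float = 1e-6):
        self.var_floor = var_floor
        self.params: Dict[str, tuple] = {}

    def fit(self, vectors_by_lang: Dict[str, List[Sequence[float]]]) -> "DiagonalGaussian":
        for lang, vectors in vectors_by_lang.items():
            dims = list(zip(*vectors))  # one tuple of values per feature dimension
            means = [fmean(d) for d in dims]
            variances = [max(pvariance(d), self.var_floor) for d in dims]
            self.params[lang] = (means, variances)
        return self

    def log_likelihood(self, x: Sequence[float], lang: str) -> float:
        means, variances = self.params[lang]
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, means, variances))


def classify(models: List[DiagonalGaussian], features: List[Sequence[float]]) -> str:
    """Pick the language maximizing the summed log-likelihood over all feature vectors."""
    langs = list(models[0].params)
    return max(langs, key=lambda lang: sum(
        m.log_likelihood(x, lang) for m, x in zip(models, features)))


# Hypothetical training data: PPRLM scores (one per language model) and a mean phoneme duration.
pprlm = DiagonalGaussian().fit({
    "es": [[-1.0, -5.0], [-1.2, -4.8], [-0.9, -5.1]],
    "en": [[-5.0, -1.0], [-4.9, -1.1], [-5.2, -0.8]],
})
duration = DiagonalGaussian().fit({
    "es": [[0.08], [0.09], [0.07]],
    "en": [[0.12], [0.11], [0.13]],
})
print(classify([pprlm, duration], [[-1.1, -4.9], [0.085]]))  # es
```

    Modelling each stream with its own Gaussian keeps every stream's covariance small and lets a new source of information be added without retraining the others.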

    A PROPOSAL OF METRICS FOR DETAILED EVALUATION IN PRONUNCIATION MODELING

    In ASR systems, it is very important to accurately model the allophonic variation found in real-world tasks. Evaluating which pronunciation variants actually improve system performance is crucial, as it determines whether the pronunciation alternatives are accepted. Traditional approaches use different criteria, and evaluation typically considers only the global impact of the augmented dictionaries on the WER, which gives little insight into the extent to which the proposed variants actually work. In this paper we propose to also evaluate the effective improvement due to each pronunciation variant, defining specific improvement metrics at the utterance level. We show that these metrics highlight the beneficial impact of applying phonological rules to certain pronunciation variants, even though the differences observed in global WER are not statistically significant.
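    The abstract does not define its utterance-level metrics, so the Python snippet below is only a hypothetical illustration of the general idea, not the paper's metrics: count word errors per utterance with the baseline dictionary and with a pronunciation variant added, then tally which utterances improve or worsen. In the toy data one utterance improves and another worsens, so the global error count is unchanged while the utterance-level tally still shows that the variant has an effect.

```python
from typing import Dict, List, Sequence


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def utterance_level_comparison(refs: List[List[str]], baseline: List[List[str]],
                               variant: List[List[str]]) -> Dict[str, int]:
    """Count utterances whose error count drops, rises or stays the same with the variant."""
    tally = {"improved": 0, "worsened": 0, "unchanged": 0}
    for ref, b, v in zip(refs, baseline, variant):
        delta = word_errors(ref, b) - word_errors(ref, v)
        key = "improved" if delta > 0 else "worsened" if delta < 0 else "unchanged"
        tally[key] += 1
    return tally


refs = [["a", "b"], ["c", "d"]]
baseline = [["a", "x"], ["c", "d"]]
variant = [["a", "b"], ["c", "e"]]
print(utterance_level_comparison(refs, baseline, variant))
# {'improved': 1, 'worsened': 1, 'unchanged': 0}
```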